Integrating a Non-Uniformly Sampled Software Retina with a Deep CNN Model
We present a biologically inspired method for pre-processing images applied to CNNs
that reduces their memory requirements while increasing their invariance to scale and rotation
changes. Our method is based on the mammalian retino-cortical transform: a
mapping between a pseudo-randomly tessellated retina model (used to sample an input
image) and a CNN. The aim of this first pilot study is to demonstrate a functional retina-integrated
CNN implementation and this produced the following results: a network using
the full retino-cortical transform yielded an F1 score of 0.80 on a test set during a 4-way
classification task, while an identical network not using the proposed method yielded an
F1 score of 0.86 on the same task. The method reduced the visual data by ×7, the input
data to the CNN by 40% and the number of CNN training epochs by 64%. These results
demonstrate the viability of our method and hint at the potential of exploiting functional
traits of natural vision systems in CNNs
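The abstract above describes sampling an input image through a pseudo-randomly tessellated retina model with foveal density fall-off. The sketch below illustrates that idea in plain Python; the node count, the quadratic density law, and all function names are illustrative assumptions, not the paper's actual retina model.

```python
import math
import random

def make_retina(n_nodes=1000, fovea_radius=0.05, seed=0):
    """Generate a pseudo-random, non-uniform retina tessellation.

    Node density is highest at the centre (fovea) and falls off with
    eccentricity, loosely mimicking the mammalian retina.  The density
    law (r = u^2 bias) is an illustrative assumption.
    """
    rng = random.Random(seed)
    nodes = []
    for _ in range(n_nodes):
        # u^2 concentrates samples near the centre: a crude foveal fall-off.
        r = fovea_radius + (1.0 - fovea_radius) * rng.random() ** 2
        theta = rng.uniform(0.0, 2.0 * math.pi)
        nodes.append((r * math.cos(theta), r * math.sin(theta)))
    return nodes

def sample_image(image, nodes):
    """Sample a square image (list of rows) at each retina node.

    Node coordinates in [-1, 1] map to pixel indices; the resulting
    vector (one intensity per node) is what would feed the CNN, with
    far fewer values than the full pixel grid.
    """
    h, w = len(image), len(image[0])
    out = []
    for x, y in nodes:
        px = min(w - 1, max(0, int((x + 1.0) / 2.0 * (w - 1))))
        py = min(h - 1, max(0, int((y + 1.0) / 2.0 * (h - 1))))
        out.append(image[py][px])
    return out
```

Because the node count is fixed regardless of image resolution, the data reduction factor grows with input size, which is the mechanism behind the ×7 reduction reported above.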
Object Edge Contour Localisation Based on HexBinary Feature Matching
This paper addresses the issue of localising object
edge contours in cluttered backgrounds to support robotics
tasks such as grasping and manipulation and also to improve
the potential perceptual capabilities of robot vision systems. Our
approach is based on coarse-to-fine matching of a new recursively
constructed hierarchical, dense, edge-localised descriptor,
the HexBinary, based on the HexHoG descriptor structure first
proposed in [1]. Since Binary String image descriptors [2]–
[5] require much lower computational resources, but provide
similar or even better matching performance than Histogram
of Oriented Gradients (HoG) descriptors, we have replaced
the HoG base descriptor fields used in HexHoG with Binary
Strings generated from first and second order polar derivative
approximations. The ALOI [6] dataset is used to evaluate
the HexBinary descriptors which we demonstrate to achieve
a superior performance to that of HexHoG [1] for pose
refinement. The validation of our object contour localisation
system shows promising results, with ~86% of edgel positions correctly labelled and only ~3% mis-labelled
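The efficiency argument above rests on binary-string descriptors being matchable with a single XOR plus a popcount. The sketch below shows generic brute-force Hamming matching of integer-packed binary descriptors; it is an illustration of that cost advantage, not the HexBinary coarse-to-fine scheme itself, and all names are assumptions.

```python
def hamming(a, b):
    """Hamming distance between two equal-length binary strings packed as ints."""
    return bin(a ^ b).count("1")

def match_descriptors(query, database, max_dist=10):
    """Brute-force nearest-neighbour matching of binary descriptors.

    Each query descriptor is matched to the database descriptor with
    the smallest Hamming distance, provided it is within max_dist.
    XOR + popcount is far cheaper than the floating-point comparisons
    needed for HoG histograms, which motivates the swap described above.
    """
    matches = []
    for qi, q in enumerate(query):
        best_i, best_d = -1, max_dist + 1
        for di, d in enumerate(database):
            dist = hamming(q, d)
            if dist < best_d:
                best_i, best_d = di, dist
        if best_i >= 0:
            matches.append((qi, best_i, best_d))
    return matches
```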
A Portable Active Binocular Robot Vision Architecture for Scene Exploration
We present a portable active binocular robot vision architecture that
integrates a number of visual behaviours. This vision architecture
inherits the abilities of vergence, localisation, recognition and
simultaneous identification of multiple target object instances. To
demonstrate the portability of our vision architecture, we carry out
qualitative and comparative analysis under two different hardware robotic
settings, feature extraction techniques and viewpoints. Our portable
active binocular robot vision architecture achieved average recognition
rates of 93.5% for fronto-parallel viewpoints and 83% for anthropomorphic
viewpoints, respectively
Interactive Perception Based on Gaussian Process Classification for House-Hold Objects Recognition and Sorting
We present an interactive perception model for
object sorting based on Gaussian Process (GP) classification
that is capable of recognizing objects categories from point
cloud data. In our approach, FPFH features are extracted from
point clouds to describe the local 3D shape of objects and
a Bag-of-Words coding method is used to obtain an object-level
vocabulary representation. Multi-class Gaussian Process
classification is employed to provide a probabilistic estimate of
the identity of the object and plays a key role in the interactive
perception cycle – modelling perception confidence. We show
results from simulated input data on both SVM and GP based
multi-class classifiers to validate the recognition accuracy of our
proposed perception model. Our results demonstrate that by
using a GP-based classifier, we obtain true positive classification
rates of up to 80%. Our semi-autonomous object sorting
experiments show that the proposed GP based interactive
sorting approach outperforms random sorting by up to 30%
when applied to scenes comprising configurations of household
objects
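The pipeline above extracts FPFH features from point clouds and codes them into an object-level Bag-of-Words representation before classification. The sketch below shows the coding step with toy vectors: hard assignment of each local descriptor to its nearest codebook word, then an L1-normalised histogram. The codebook, feature values, and function name are illustrative assumptions, not learned from data.

```python
def bow_histogram(features, codebook):
    """Encode a set of local descriptors as a Bag-of-Words histogram.

    Each descriptor (e.g. an FPFH vector describing local 3D shape) is
    assigned to its nearest codebook word by squared Euclidean distance,
    and the word counts are L1-normalised into an object-level vector
    that a multi-class classifier (GP, SVM, ...) can consume.
    """
    counts = [0] * len(codebook)
    for f in features:
        best_w, best_d = 0, float("inf")
        for w, word in enumerate(codebook):
            d = sum((fi - wi) ** 2 for fi, wi in zip(f, word))
            if d < best_d:
                best_w, best_d = w, d
        counts[best_w] += 1
    total = sum(counts) or 1  # avoid division by zero on empty input
    return [c / total for c in counts]
```

Normalising by the total count makes the representation independent of how many points the object's cloud contains, so objects scanned at different resolutions remain comparable.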
On the Calibration of Active Binocular and RGBD Vision Systems for Dual-Arm Robots
This paper describes a camera and hand-eye
calibration methodology for integrating an active binocular
robot head within a dual-arm robot. For this purpose, we
derive the forward kinematic model of our active robot head
and describe our methodology for calibrating and integrating
our robot head. This rigid calibration provides a closed-form
hand-to-eye solution. We then present an approach for
dynamically updating the cameras' extrinsic parameters for optimal
3D reconstruction, which is the foundation for robotic tasks such
as grasping and manipulating rigid and deformable objects. We
show from experimental results that our robot head achieves
an overall sub-millimetre accuracy of less than 0.3 millimetres
while recovering the 3D structure of a scene. In addition, we
report a comparative study between current RGBD cameras
and our active stereo head within two dual-arm robotic testbeds
that demonstrates the accuracy and portability of our proposed
methodology
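The abstract above derives a forward kinematic model of the active robot head to place the cameras within the dual-arm robot's frame. The toy sketch below composes homogeneous transforms for a pan-tilt chain with a fixed camera offset; the joint layout, the baseline value, and all names are assumptions for illustration, not the paper's actual kinematic model.

```python
import math

def rot_z(theta):
    """4x4 homogeneous rotation about the z-axis (pan)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def rot_x(theta):
    """4x4 homogeneous rotation about the x-axis (tilt)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]]

def translate(x, y, z):
    """4x4 homogeneous translation."""
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def camera_pose(pan, tilt, baseline=0.1):
    """Toy forward kinematic chain for one camera of an active stereo head.

    Composes the pan joint, a fixed lateral camera offset (half the
    stereo baseline), and the tilt joint to give the camera pose in the
    head frame.  Updating this pose as the joints move is what allows
    the extrinsic parameters to be refreshed dynamically.
    """
    T = matmul(rot_z(pan), translate(baseline / 2, 0, 0))
    return matmul(T, rot_x(tilt))
```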
Egocentric Perception using a Biologically Inspired Software Retina Integrated with a Deep CNN
We presented the concept of a software retina, capable
of significant visual data reduction in combination with
scale and rotation invariance, for applications in egocentric
and robot vision at the first EPIC workshop in Amsterdam
[9]. Our method is based on the mammalian retino-cortical
transform: a mapping between a pseudo-randomly tessellated
retina model (used to sample an input image) and a
CNN. The aim of this first pilot study is to demonstrate a
functional retina-integrated CNN implementation and this
produced the following results: a network using the full
retino-cortical transform yielded an F1 score of 0.80 on a
test set during a 4-way classification task, while an identical
network not using the proposed method yielded an F1
score of 0.86 on the same task. On a 40K node retina the
method reduced the visual data by ×7, the input data to the
CNN by 40% and the number of CNN training epochs by
36%. These results demonstrate the viability of our method
and hint at the potential of exploiting functional traits of
natural vision systems in CNNs. In addition to the above
study, we present further recent developments in porting
the retina to an Apple iPhone, an implementation in CUDA
C for NVIDIA GPU platforms and extensions of the retina
model we have adopted
Glasgow's Stereo Image Database of Garments
To provide insight into cloth perception and manipulation with an active
binocular robotic vision system, we have compiled and released a database of
80 stereo-pair colour images with corresponding horizontal and vertical
disparity maps and mask annotations for 3D garment point cloud rendering.
The stereo-image garment database is part of research conducted under
the EU-FP7 Clothes Perception and Manipulation (CloPeMa) project and belongs to
a wider database collection released through CloPeMa (www.clopema.eu). This
database is based on 16 different off-the-shelf garments. Each garment has
been imaged in five different pose configurations on the project's binocular
robot head. A full copy of the database is made available for scientific
research only at https://sites.google.com/site/ugstereodatabase/.
A Software Retina for Egocentric & Robotic Vision Applications on Mobile Platforms
We present work in progress to develop a low-cost highly
integrated camera sensor for egocentric and robotic vision. Our underlying
approach is to address current limitations to image analysis by Deep
Convolutional Neural Networks, such as the requirement to learn simple
scale and rotation transformations, which contribute to the large computational
demands for training and opaqueness of the learned structure,
by applying structural constraints based on known properties of the human
visual system. We propose to apply a version of the retino-cortical
transform to reduce the dimensionality of the input image space by a
factor of ×100, and map this spatially to transform rotations and scale
changes into spatial shifts. By reducing the input image size accordingly,
and therefore learning requirements, we aim to develop a compact and
lightweight egocentric and robot vision sensor using a smartphone as the
target platform
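The key property claimed above is that the retino-cortical transform turns rotations and scale changes into spatial shifts. The log-polar sketch below demonstrates that mechanism: rotating a point about the origin moves its cortical cell by exactly one theta bin. Bin counts, ranges, and names are illustrative assumptions, not the adopted retina model.

```python
import math

def to_logpolar(points, n_r=8, n_theta=16, r_min=0.05, r_max=1.0):
    """Map 2D points into discrete (log-r, theta) cortical coordinates.

    Under this log-polar style mapping, rotating the input about the
    origin becomes a pure shift along the theta axis, and uniform
    scaling becomes a shift along the log-r axis -- so a CNN fed the
    cortical image need not learn those transformations.
    """
    cells = []
    log_min, log_max = math.log(r_min), math.log(r_max)
    for x, y in points:
        r = max(r_min, min(r_max, math.hypot(x, y)))
        theta = math.atan2(y, x) % (2 * math.pi)
        ri = min(n_r - 1,
                 int((math.log(r) - log_min) / (log_max - log_min) * n_r))
        ti = int(theta / (2 * math.pi) * n_theta) % n_theta
        cells.append((ri, ti))
    return cells
```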
- …